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Abstract 

We investigate an application of the Tukey’s methodology in Theil’s regression to obtain a con¬ 
fidence interval for the true slope in the straight line regression model with not necessarily normal 
errors. This specific approach is implemented since 2005 in a package of the software R; however, 
without any theoretical background. We illustrate by Monte Carlo simulations, that this methodol¬ 
ogy, unlike the classical Theil’s approach based on Kendall’s tau, seriously deflates the true confidence 
level of the resulting interval. We provide also rigorous proofs in case of four data points (in general) 
and in case of five data points (under some additional conditions); together with a real life methods 
usage example in the latter case. Summing up, we demonstrate that one should never combine sta¬ 
tistical methods without checking the assumptions of their usage and we also give a warning to the 
already wide community of R users of Theil’s regression from various fields of science. 
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1 Introduction 

The Theil’s regression (sometimes referred to as Theil-Sen regression) is a robust non-parametric replace¬ 
ment of the traditional least squares approach to the straight line regression model Y = (]q + Pix + e 
and also to some more complex linear regression models (the pioneering papers were Theil 1950a,b,c). 
The Theil’s methodology does not require normality of the random errors e, while being able to provide 
parameter estimates, tests of linear hypotheses about the parameters, as well as confidence intervals for 
the parameters (see e.g. Hollander and Wolfe 1999 for a detailed description of the methods). 

We focus on the confidence interval (Cl) for the true slope /3i. For the software R (R Development 
Core Team 2010) there exists a package called mblm (Komsta 2013) that includes many tools of Theil’s 
regression. But, surprisingly, when asked for a Cl for /3i, the package does not compute the classical 
Theil’s Cl for /3i, proposed already in Theil (1950a) and making use of the theory of Kendall’s tau. 
Instead, the package uses a different approach that utilizes without any reference the well-known Cl 
based on the Wilcoxon’s signed rank test. In general setting, the Cl based on the Wilcoxon’s signed 
rank test has been ascribed to John Tukey (see Hollander and Wolfe 1999 for historical details), who 
originally developed it to obtain a Cl for the true center of symmetry of a symmetric distribution from 
which we observed a sample of independent and identically distributed data. However, it turns out very 
quickly (see Section 4), that in case of slope estimation in Theil’s regression the input data are definitely 
not independent. Therefore, the true confidence level of the resulting interval provided by the package 
mblm is of question and our paper shows that this negative premonition turns real. We think that it is 
important to point out and study this issue; and not just for theoretical reasons, i.e. to demonstrate that 
even a tempting combination of some proven statistical methods may have disastrous consequences if one 
ignores the assumptions of their usage. Our practical motivation was to alert the community of applied 
statisticians, because Theil’s regression (for its simplicity) and the corresponding R package mblm (for 
its availability) are rather popular among researchers who apply statistics in other sciences - see e.g. 
Logan (2011), a textbook for biologists that recommends the package mblm, or this sample of papers 
reporting the use of the package mblm in microbiology, genetics, chemistry, ecology, forestry, agriculture, 
hydrology, meteorology and also in behavior analysis: Hunter et al (2012), Carothers et al (2010), Kumari 
et al (2012), Denys et al (2012), Hunter et al (2013), Lucas et al (2013), Sardans and Pehuelas (2015), 
Pocewicz et al (2007), Heiskanen et al (2011, 2012), Mueller et al (2014), Arpaci et al (2013), Eastaugh 
et al (2012), Barroso et al (2015), Cuevas et al (2010), Zottele et al (2010), Puertas Orozco et al (2011), 
Vannest et al (2013). 

Our paper is organized as follows. Section 2 defines the underlying model. In Sections 3 and 4 we 
provide detailed description of the classical Theil’s Cl and the Cl based on the Tukey’s methodology, 
respectively; with an illustrative application of both methods on a real life dataset of n = 5 data points 
in Section 5. Section 6 shows by means of Monte Carlo simulations that the Cl based on the Tukey’s 
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methodology has its true confidence level under the nominal confidence level which is set to the traditional 
95% throughout the whole paper. We prove this observation rigorously in case of n = 4 data points 
(Section 8). In Section 7, under some additional conditions, we provide a proof also in the setting of the 
above-mentioned real life example, i.e. for n = 5 data points. For the sake of completeness we treat also 
the case of n = 3 data points in Section 9. Finally, in Section 10 we add some notes on the R package 
mblm implementation of the Cl for the true slope based on the Tukey’s methodology. Proofs of the 
theorems were deferred to the Appendix. 


2 The model 

For each of the n fixed and distinct points Xi,X 2 , ■ ■ ■ ,Xn (values of the predictor x) we observe the value 
of a random variable Y (response). We get a set of observations Yi,Y 2 ,... ,Yn, where Yi is the response 
at Xi- Without loss of generality we assume that a;i < a ;2 < • • • < a;„. Our linear model has the form 

Yi = Po + Pix^ + Si, t = l,2,...,n, 

where Po (intercept) and Pi (slope) are unknown parameters. Finally, the unobservable random errors 
ei,£ 2 , • • • ,£n are iid random variables from a continuous (not necessarily normal) distribution. 


3 Theil’s confidence interval for slope 


The hypothesis 


Ho: Pi = P* 


can be tested using Di = Yi — p*Xi = {pi — p*)xi + Po + Si- Provided that Hq is true, the DPs do not 
depend on the xPs, i.e. they do not correlate. Hence, the validity of Hq can be “measured” e.g. by the 
sample Kendall’s correlation coefficient 

_N,-Nd 
" (^) ’ 

where Nc is the number of concordant pairs (i.e. pairs of points [xi,Di] and [xj,Dj] such that {xi — 
Xj){Di — Dj) > 0) and is the number of discordant pairs (i.e. pairs of points [a;^, Di] and [xj^Dj] such 
that {xi — Xj){Di — Dj) < 0). The test statistic K = = N^ — Nd is known as the Kendall K statistic 

(see Hollander and Wolfe 1999). By ]K„ we denote its distribution under independence of the DPs from 
the xPs. The distribution ]K„ has been tabulated (see e.g. Hollander and Wolfe 1999) and implemented 
in many statistical softwares (see e.g. Wheeler 2009), because it depends just on the sample size n, but 
not on the distribution of the data. The distribution K„ is discrete, symmetric and has the support 








because K has the same parity as ( 2 ). A test of the hypothesis Hq at the significance level a is then 


reject Hq : Pi = P* , if \K\ > kn{aj2), 


where kn{o'/2) stands for the upper quantile of the distribution ]K„. It should be such an integer that, 
under Hq, P{K > fc„(a/2)) = a/2. However, due to the discrete nature of the distribution K„, an exact 
equality is virtually impossible. Therefore, we define kn{a/2) as such a unique integer with the same 
parity as ( 2 ) that 

P{K > kn{a/2)) < a/2 and P{K > fc„(a/2) - 2) > a/2. 


The consequence is that in general the true probability of the type I error of the above test is bellow the 
nominal significance level a, because it equals 2 • P{K > fc„(a/2)). 

Similar idea leads to a Cl for the true slope Pi. Denote 


o - Yj 

Oii — r 


(* < j) 


the “sample” slope of the line given by the pair of sample points [xi, Yp and [xj,Yj]. There are -^ = (2) = 
n{n — l)/2 such slopes. Order them ascendingly and denote the resulting sequence si < S 2 < • • • < sat - 



since the random Cj’s come from a continuous distribution, we may ignore ties between the sample slopes, 
because they will happen with zero probability, i.e. we shall assume that there are sharp inequalities 
between the Sj’s. The Theil’s 1 — a confidence interval for the true slope j3i is 


{si ) Su) , 


( 1 ) 


where 


N-kn{a/2) ^ ^ 


and „, 'V + W) , 


see e.g. Hollander and Wolfe (1999). Note that the indices I and u are symmetric in the sense that 
the same Cl is obtained also by taking the Cth slope from bellow a the Cth slope from above, since 
I + u = N + 1. The discrete nature of the distribution K„ involved causes the true confidence level of 
the above interval to be typically over 1 — a. The exact value is given by the following theorem. 


Theorem 1. For all k of the form N — 2i (is = 0,1,..., [N/4:\) put I = (iV —fc)/2 +1 and u= {N + k)/2. 
Then the true confidence level of the Theil-type Cl (si,s„) is 1 — 2 • P{K > k) where K is a random 
variable following K„. 


Theorem 1 implies that the true confidence level of the Theil’s Cl (1) equals 


l-2-P{K>kn{a/2)), 


( 2 ) 


which is at least 1 — a. 


4 An a la Tukey confidence interval for slope 

We start with a brief description of the Tukey’s methodology in a general setting (see e.g. Hollander and 
Wolfe 1999). 

Suppose we have some input iid random variables Zi, Z 2 , ■ ■ ■, Zjq coming from a continuous symmetric 
distribution. We compute the so-called Walsh averages {Zi + Zj)/2 {i < j). Now we order the Walsh 
averages ascendingly (due to the continuity of the underlying distribution, we may ignore ties) and denote 
the resulting set wi < ^2 < • • • < wp, where P = ('^) + N = N{N + l)/2. Then the Tukey’s 1 — a Cl 
for the center of symmetry of the true distribution of the Zfs will be 

{wL,wu), (3) 

where U = tN{ct/2). The value of L will be “symmetric” in the sense that L = P — t]s[{ct/2) + 1, i.e. 
one takes the tN{ce/2)-th Walsh average from bellow and the tAr(a/2)-th from above. Finally, tN{a/2) 
denotes the a/2 upper quantile of the null distribution of the Wilcoxon’s signed rank statistic T+ having 
the range 0,1, 2,..., P (see e.g. Hollander and Wolfe 1999 for details). Similarly as with the Theil’s Cl, 
the true confidence level of the Tukey’s Cl is typically strictly above 1 — a. 

Now, as the R package mblm does, we apply the Tukey’s methodology described above to obtain a Cl 
for the true slope fii in Theil’s regression. The role of the Zfs will be played by the set of the slopes of 
all lines given by all pairs of the data points, i.e. by the set {Sij-,i < j}. We name the resulting interval, 
i.e. the Tukey’s Cl based on the slopes Sij, the a la Tukey confidence interval. From a particular 
point of view, it seems to be a good idea to apply the Tukey’s approach on the sample slopes Sij, 
because the Tukey’s approach was bred for and is known to perform well in situations of symmetrically 
distributed data - and it is easy to see that the distributions of our Sij ’s are indeed symmetric aroud fix ! 
Unfortunately, one of the basic assumptions of the Tukey’s methodology is independence of the input 
data, i.e. independence of the Z/s. However, it is easily seen, that this assumption does not hold for 
the slopes Sij. Actually, there is functional dependence among the slopes, because, for example, the 
knowledge of S'!.!, S'!, 2 , • ■ •, enables us to compute the remaining S.fis, since 

Sij{xi - Xj) - Sii{xi - Xi) 

Oij — 

Xi — Xj 

This does not necessarily mean that the a la Tukey Cl does not provide at least the nominal confidence 
level. For example, it may happen that the nominal confidence level is preserved, just the interval is 
redundantly wide. The best scenario from the a la Tukey CFs point of view is that the interval provides 
the nominal level of confidence while being narrower than the classical Theil’s Cl described in Section 3. 
The real state of affairs will be presented after an illustrative example. 



5 A real life example 

Let us explain the above-described methods on real data analyzed in Hollander and Wolfe (1999) (Ex¬ 
amples 9.1-3) and also in Logan (2011) (Example 8 .G), where the following description can be found. 
Smith (1967) investigated the effects of cloud seeding on rainfall in the Snowy Mountains, Australia. The 
experiment took place in two areas — the target and the control. Within a year a number of periods 
were randomly allocated for seeding the target area and additional periods for non-seeding the target 
area. The total rainfalls Tgeeded, Lunseeded in the target and Cseeded, Gunseeded in the control area during 
the seeding and non-seeding periods were recorded. Within a single year, the impact of seeding was 
assessed via a double ratio Y = (Tseeded/(^seeded)/(Yunseeded/C'unseeded) and the experiment was repeated 
over n = 5 years (years denoted by x). The measurements are summarized in Table 1 and depicted on 
Fig. 1. 


Table 1: The cloud seeding experiment — the double ratio Vi measures the impact of cloud seeding on 
the rainfall in year Xi 


year Xi 1 2 3 4 5 

double ratio Y I 1.26 1.27 1.12 1.16 1.03 


Figure 1: The cloud seeding experiment — linear dependence of the double ratio Vi on time 


o 


CO _r 



1 2 3 4 5 


year x, 


We adopt the classical straight line regression model Vi = l3o+l3iXi + ei (z = 1, 2,..., 5). Logan (2011) 
states that “whilst there may not appear to be any evidence of non-normality..., it could be argued that 
there are too few observations on which to make meaningful decisions about normality (of the random 
errors) and it might be safer to not make distributional assumptions”. Therefore, the Theil’s regression 
in place of the classical least squares inference is applied. The ordered values of the .^ = ( 2 ) = 10 sample 
slopes Sij = (Vi — Vj)j{xi — Xj) are si < S 2 < • • • < siq: —.1500, —.1300, —.0800, —.0700, —.0575, 
— .0550, —.0450, —.0333, .0100, .0400. The dashed line on Fig. 1 shows the linear trend estimated by 
the Theil’s approach. Its slope is the median of the Ay’s, i.e. (—0.0575 -I- (—0.0550))/2 = —0.05625 and 
suggests a decreas of the double ratio over time, i.e. a decreas over time of the rainfall increases resulting 
from the seeding. 

Rather than a point estimate, our main concern are the 95% Cl’s for the true slope. Let us start 
with the Theil’s Cl (Section 3). For a = 5% the appropriate upper quantile ^ 5 ( 2 . 5 %) is 10 (see e.g. 
Table A.30 in Hollander and Wolfe 1999), I = (10 — 10)/2 -|- 1 = 1 and u = (10 -I- 10)/2 = 10. Hence, the 
resulting Cl (1) is (si, sio), i.e. it is given by the minimum and the maximum sample slope. Numerically, 

(si,sio) = (-0.15,0.04) 

and since the Cl contains zero, the negative trend suggested by the point estimate of the true slope does 
not seem to be significant. 

For the the a la Tukey Cl, the appropriate upper quantile fio(2.5%) is 47 (see e.g. Table A.4 in 
Hollander and Wolfe 1999). The ordered values of the P = 10- (10-1-1)/2 = 55 Walsh averages {si + Sj)/2 
(* < j) are wi < W 2 < ■ ■ ■ < W 55 : —.150, —.140,..., .040. Further, U = 47 and L = 55 — 47 -I- 1 = 9. 
Therefore, the resulting Cl (3) is {wg,W 4 r), i.e. it is given by the 9-th Walsh average from bellow and 
the 9-th Walsh average from above - which is the 47-th from bellow, since there are together 55 Walsh 




averages. For this dataset, 


(^ 9 ,^ 47 ) = (-0.100,-0.015) 

and since this Cl does not contain zero, it confirms the negative trend suggested by the point estimate 
of the true slope, i.e. a decrease over time of the rainfall increases resulting from the seeding. Note that, 
from this point of view, there is a discordance between the Theil’s Cl and the a la Tukey CL 

The a la Tukey Cl (—0.100, —0.015) is reported also in Logan (2011), because the book utilized the R 
package mblm. For this data, the a la Tukey Cl is much narrower then the rather conservative Theil’s Cl 
(—0.15,0.04). This is not just coincidence and in general the a la Tukey Cl is not to be trusted because 
in what follows we show that there is a confidence level issue with it. 


6 Monte Carlo study 

At the end of Section 4, a positive scenario was hypothesized, that the a la Tukey Cl could be narrower 
than the classical Theil’s CL This was supported also by the real life example about cloud seeding in 
Section 5. However, this benefit turns out to be worthless because there is a crucial problem with 
the a la Tukey Cl’s true confidence level, as can be seen from the results of a Monte Carlo study we 
have conducted. In Table 2 (except the last column; see below) we provide simulation estimates of the 
true confidence levels of the a la Tukey Cl for the true slope under various settings. The number of 
data points n changed from 6 to 200. The true values of intercept /3o and slope /3i were set to 0 and 1, 
respectively. The iid random errors ei were generated from the normal distribution N (0,0.01), the Cauchy 
distribution with location parameter 0 and scale parameter 0.1, or the uniform distribution on the interval 
(—0.2,0.2). The motivation for the scale parameters of the distributions was to make the spread of the 
Ej’s comparable with the spread of the Xi’s, i.e. to achieve that the data points [xi,Yi] do not produce 
an ideally straight line, nor resemble a shapeless data cloud. In the part “Evenly spaced Xj’s”, the Xi’s 
created an equidistant design on the interval (0,1), more precisely, Xi = — 1) for i = 1,2 ,... ,n. 

In the part “Two clusters of evenly spaced Xi’s’\ the design of the experiment consisted of two clusters 
of evenly spaced points on the subintervals (0,1/3) and (2/3,1), more precisely, Xi = {i — l)/(3(n/2 — 1)) 
for z = 1,2,... ,n/2 and Xi = 2/3 + (z — zz/2 — l)/(3(n/2 — 1)) for i = n/2 + l,rz/2 + 2,... ,n. Each 
figure in Table 2 (except the last column) is based on 10,000 simulations and it is the proportion of times 
(rounded to three decimal places) the a la Tukey Cl covered the true slope /3i. The nominal confidence 
level was set to 95%. 

Just for illustration, the rightmost column of Table 2 contains the true confidence levels of the 
Theil’s CL these figures are not based on simulations, they have been computed by (2) using the R 
package SuppDists (Wheeler 2009). Thanks to the distribution-free property of the Theil’s Cl, these 
true confidence levels depend just on the number of data n (i.e. not on the design of the x^’s, or on the 
underlying distribution of the random errors Ei) and are, of course, always at least as high as 95%. 

The main message of Table 2 is that, irrespective of the probability distribution of the random errors, 
the true confidence level of the a la Tukey Cl is strictly below the nominal 95% and decreases rapidly 
with increasing number of data n in all settings presented in our study. The design of the XiS does 
not seem to play an important role either; we tried also some other designs not reported here and the 
resulting figures were very similar. The Monte Carlo simulations support our suspicion that the method 
of construction of the a la Tukey Cl for the true slope is wrong. In what follows we provide rigorous 
treatment of the problem in case of rz = 5, rz = 4, and rz = 3 data points. 


7 The case of n = 5 data points 

We are going to examine the true confidence level of the a la Tukey Cl for sample size and nominal 
confidence level as in the real life example about cloud seeding in Section 5, i.e. in case of rz = 5 
data points and nominal confidence level I — a = 95%. In Section 5 it was derived in detail, that the 
corresponding Theil’s Cl (1) and the a la Tukey Cl (3) are (si,sio) and ( 109 , 1047 ), respectively. The 
following theorem shows their mutual relationship. 

Theorem 2. For n = 5, the 95% a la Tukey Cl ( 109 , 1047 ) is always a subset of the 95% Theil’s Cl 
(si,sio). 

Theorem 2 itself is not enough to claim that the a la Tukey Cl has its true confidence level under 
95%. However, by Monte Carlo simulations not reported here we noticed that the a la Tukey Cl (zcg, z(; 47 ) 



Table 2: True confidence levels of the 95% Theil’s Cl (computed numerically) and the 95% a la Tukey Cl 
(simulation estimates under various arrangements of the Xi’s and various distributions of random errors). 

a la Tukey Cl Theil’s Cl 

Evenly spaced Xi’s Two clusters of evenly spaced Xi’s 

Distribution of random errors: Distribution of random errors: 

Number of data n normal Cauchy uniform normal Cauchy uniform 

6 .869 .850 .867 .871 .855 .867 .983 

10 .804 .773 .793 .804 .777 .800 .953 

20 .679 .636 .675 .678 .646 .675 .953 

30 .591 .551 .588 .595 .561 .596 .951 

40 .533 .494 .540 .530 .499 .541 .952 

50 .486 .453 .489 .491 .456 .498 .950 

60 .449 .416 .451 .452 .424 .457 .950 

70 .425 .388 .427 .426 .396 .429 .951 

80 .399 .365 .402 .402 .373 .403 .951 

90 .379 .351 .376 .384 .361 .383 .950 

100 .357 .340 .364 .369 .343 .365 .950 

120 .329 .307 .329 .338 .310 .334 .950 

140 .311 .290 .310 .302 .293 .309 .950 

160 .294 .267 .295 .303 .278 .296 .950 

180 .276 .246 .278 .276 .251 .281 .950 

200 .261 .240 .265 .268 .249 .266 .950 

happens to be very often the subset of the even narrower Theil-type Cl ( 52 , 59 ). The true confidence 
level of ( 52 , 59 ) can be obtained easily: put / = 2, u = 9, n = 5, and N = 10, then the notation of 
Theorem 1 implies that k = 8 and the theorem itself gives the true confidence 1 — 2 • P{K > 8 ) which 
can be evaluated e.g. by Table A.30 in Hollander and Wolfe (1999). The approximate result is 91.67%. 
Therefore, our aim is to show, that the a la Tukey Cl is “very often” a subset of ( 52 , 59 ) with 

the poor confidence level of 91.67%. The consequence will be, that although the true confidence level of 
the a la Tukey Cl (^ 9 ,^ 47 ) could be over that of (s 2 , sg), it is definitely under 95%. The next theorem 
states exact conditions in terms of the SiS when the above-described desired “very often” inclusion of 
(wg,W 4 Y) in ( 52 , 59 ) happens, i.e. conditions when the lower (upper) bound of {wq,W 4 y) is over (under) 
the lower (upper) bound of ( 52 , 59 ). 

Theorem 3. If n = 5 then: a) The random event sg < wg oecurs if and only if 2 s 2 < 5 i -I- 59 . b) The 
random event W47 < sg occurs if and only if sg + 5io < 259 . 

The following theorem provides an upper bound for the true confidence level T’(/3i € (rug, 1 ^ 47 )) of 
the a la Tukey Cl {wg,W 47 ). 

Theorem 4. Let n = 5. Denote 

Pi = P{Pi e (S2,59) A 2 s2 < Si -I- Sg a 52 -I- SiQ < 2s9) 

P 2 = PiPl e (S2, Sio) A 2 s2 < Si -I- Sg a 2s 9 <52-1- Sio) 

P3 = P(/ 3 i e (51,59) A Si -I- Sg < 2 s2 a 52 -I- SiQ < 2 s9) 

P4 = P(/ 3 l G (si, Sio) A Si -I- Sg < 2 s2 A 2s 9 <52-1- Sio) 

Then P(/3i € (icg, ^ 47 )) < pi + pg + P 3 +P 4 - 

Let us discuss Theorem 4 in greater detail. The probability pi is simply the true confidence level of 
(s 2 , Sg) together with the probability of the above-discussed “very often” inclusion of (rug, 1 ^ 47 ) in (sg, sg) 
with the poor true confidence level 91.67%. This means that pi is at most 91.67%. The probabilities pg, 
P 3 , p4 are based on the true confidence levels of the slightly wider intervals (s 2 ,sio), ( 54 , 59 ), (si,sio) 
and these confidence levels can be high. However, by Theorem 3 it can be seen immediately that the 
probabilities P 2 , Ps, and p 4 reflect only cases when these intervals include also (iug,W 47 ) whereas (wg, W 47 ) 
is not completely included in (sg, sg). Monte Carlo simulations suggest that these cases are rare, therefore 
we hope that pg, ps, and p 4 turn out to be low. 









All we need to do is to determine the probabilities pi, p2, P3, and p4. Note that they are given just 
by the joint probability distribution of the ordered slopes Si, i.e. one does not need to examine the mnch 
more complicated joint probability distribution of the ordered Walsh averages Wi anymore. There are 10! 
possible orderings of the ten slopes S'!,2, S'!, 3, ..., >S'4,5. Denote Bi, B2, ■ ■ ■ , i?io! the appropriate random 
events, i.e. each Bi denotes an event that a particular ordering of the slopes happened. Then, pi (and 
similarly p2, ps, and P4) can be decomposed as 

10 ! 

Pi = Piif^l € (S 2 , Sg) A 2s 2 < Sl + Sg A Sg + Sio < 2sg} fl Bi). 

i=l 

Let Bi denote, for example, the ordering 

5'4^5 < < S'2,5 < < 82,3 < 82,4 < 81^3 < 81^4 < 83^4 < 81^2- ( 4 ) 


Since 


S^l = 


Yj - Y, 

Xi — Xj 



( 5 ) 


the 9 inequalities in (4) that define the ordering can be rewritten as 9 linear inequalities of the form 


ClSi + 0262 -i -h 0565 < Co, ( 6 ) 

where cq, ci, cg,. .., C 5 are constants depending on the Xi’s. Further, under Bi the conditions /3i € (sg, sg), 
2s2 < Si + sg and 2s9 > sg + sio can be also rewritten as 2+l+l=4 inequalities of the form (6), because 
under Bi, for example, si = /3i + sg = /3i + , etc. Therefore, 

X4 2-5 2-3 2-5 

10 ! 

P4=Y,P{Pi), 

2=1 


where P{Pi) is the probability that the random errors vector (ei, eg,..., es)^ appears in the 5-dimensional 
polytope Pi with faces given by the above-mentioned 9-1-4=13 linear inequalities of the form (6). 

Nevertheless, we still have to consider 10! = 3,628,800 polytopes and evaluation of P 2 , P 3 and P 4 
is going to quadruple this number. Fortunately, most of these polytopes are empty sets because the 
following theorem implies that a lot of the 10! possible orderings of the slopes are impossible. 

Theorem 5. Let a < b < c be three indices. Then the slope 8ac is neither the greatest nor the smallest 
of the trio 8ab, 8ac, and 8bc- 

An automatized computer inspection of the 10! possible orderings revealed quickly, that only 768 of them 
conform to Theorem 5, which means that we have to deal just with 768 polytopes to obtain a pi. At 
this point we have to set concrete values of the xfs, because the P(Fi)’s depend on them. We decided 
for the following. 

Condition 1. The Xi’s create an equidistant design Xi = i for i = 1,2,... ,n. 

As byproducts, this choice of the xfs has the following pleasant consequences that again reduce the 
amount of computations. 


Theorem 6. If Condition 1 holds and n = 5, then p 4 = 0. 

Theorem 7. If Conditions 1 holds and the distribution of the random errors is symmetric (the latter 
and fairly common assumption will be posed later), then p 2 = P 3 - 

Still, it is not easy to evaluate the P(Pi)’s under an arbitrary probability distribution of the random 
errors Ej. Therefore, we decided for the uniform distribution to make the evaluation easier: 


Condition 2. The probability distribution of the errors Si is uniform on the interval (—1,1). 


Under Condition 2 the probability distribution of the vector (ei, £ 2 ,. ■., Es)^ is uniform in the 5-dimen¬ 
sional cube with the vertices [±1, ±1, ±1, ±1, ±1] and edges of length 2. Therefore, the probabilities 
P(Pi) reduce to 


P{P^) 


V{Qi) 


25 



where V{-) denotes volume and the Qi’s are the intersections of the polytopes Pi (each given by a set of 
above-mentioned 13 linear inequalities) with the cube given by the 10 inequalities 

Si < 1 and £i > —1 (i = 1, 2,..., 5), 

which means that each Qi is again a polypote; given by 13 -I- 10 = 23 inequalities of the form (6). To 
evaluate the volumes of such polytopes we used a specialized software Vinci (Biieler and Enge 2003): for 
each polytope the 23 defining inequalities were passed to Vinci and the computation of the volume was 
then based on the triangulation of the polytope and computation of determinants. 

So we obtained pi and P 2 by evaluating the volumes of the above-mentioned 768 -I- 768 polytopes and 
by Theorems 6 and 7 we have 

Pi+P 2 +P 3 +P 4 = 0.8107315 -h 0.0595787 -h 0.0595787 + 0 = 92,98889%. 

This provides an upper bound for the true confidence level of the interval (wg,W 4 Y), i.e. the true 
confidence level is under the nominal level 95% (as a Monte Carlo estimate based on 1,000,000 simulations 
we obtained 87.9%). We note that Condition 1 is just technical, since we are able to evaluate pi-|-p 2 +P 3 + 
P 4 under any particular arrangement of the Xi’s. However, Condition 2 about the uniform distribution of 
the random errors is crucial, because it reduced our computation to evaluation of volumes of polytopes, 
which could be accomplished by Vinci. 

8 The case of n = 4 data points 

In case of n = 4 data points, the 95% a la Tukey Cl will be (wi,W 2 i) which is obviously the same as 
(si, se)- Its true confidence level can be evaluated by Theorem 1: put ^ = 1, m = 6, n = 4, TV = 6, obtain 
k = 6 and the theorem gives the confidence 1 — 2 ■ P{K > 6), which can be evaluated e.g. by Table A.30 
in Hollander and Wolfe (1999). The approximate result is 91.67% which is definitely under 95%, i.e. the 
a la Tukey Cl does not work correctly in this case either. Note that - unlike the case of n = 5 data 
points - the obtained result 91.67% holds in general, e.g. it is completely independent of the additional 
Conditions 1 or 2. 

The above paragraph also means that in case of n = 4 data points the Theil’s approach is unable to 
produce a 95% Cl, because the confidence level of (sijSe) (the widest Theil-type interval) is under 95%. 
From another point of view, the 95% Theil’s Cl cannot be produced, because ^4(2.5%) satisfying our 
definition of the upper quantile value does not exist. 


9 The case of n = 3 data points 

With just n = 3 data points at hand, the Theil’s approach breaks down, because ^ 3 ( 2 .5%) does not exist. 
The same happens to the 95% a la Tukey Cl, because the Tukey’s methodology does not work for such 
a low number of data and nominal confidence level of 95% (^ 3 ( 2 .5%) does not exist). 


10 An R implementation of the a la Tukey confidence interval 

As we already noted, the a la Tukey Cl for the true slope is implemented in the R package mblm, however, 
without any reference to a theoretical background. It is available in the CRAN package repository since 
2005, but since that time the package documentation has been just noting that the package does not 
implement the original Theil’s Cl based on Kendall’s tau and it is considered to be implemented in 
next version of the package. However, it has not been implemented till now (August 2016), despite the 
fact that already the third version of the package has been released. Nevertheless, the main problem is 
that the package does not provide any warning about the deflated true confidence level of the intervals 
produced. The only exceptions are the cases of n = 4 and n = 3 data points. In case of n = 4 data 
points the package mblm produces a correct warning message, that the requested confidence level is not 
achievable. However, careful inspection of the package code reveals that it is just a coincidence: in fact, 
the warning says nothing about the Cl for the true slope, because it has been invoked by the computation 
of a Cl for the true intercept (this Cl was not discussed in our paper). For n = 3 data points the package 
mblm produces an error message, however, as before the true reason for the message is a problem with 
the computation of a Cl for the true intercept. 



11 Conclusions 


We have shown by means of Monte Carlo simulations that the a la Tukey confidence interval for the true 
slope in the straight line regression model seems to be unable to achieve the nominal confidence level. 
The loss of interval’s confidence does not seem to depend too much on the design of the experiment or 
on the distribution of the random errors, but becomes very serious with increasing number of data - in 
all cases with over 160 data points we observed the true confidence level even under 30% instead of the 
nominal 95%. 

In case of n = 4 data points we easily obtained also the true confidence level of the a la Tukey 
confidence interval - the simplicity of the reasoning resulted from the fact that the lower and upper limit 
of the a la Tukey confidence interval turned out to be some of the original sample slopes. However, in 
case of n = 5 data points the situation was much more complicated: we were able to obtain only an 
upper bound for the true confidence level and we numerically evaluated this upper bound under the 
condition of uniformly distributed random errors. 

Theoretically, the process of evaluation of the above mentioned upper bound can be adopted to 
obtain the exact value of the true confidence level of the a la Tukey interval. However, already in case 
of n = 5 data points there are 55 Walsh averages given by the ten slopes Stj and, theoretically, these 
Walsh averages can be arranged in 55! permutations. These would result in the necessity to evaluate 
and sum volumes of as much as 55! « 1.27 • 10^^ polytopes - a very hard task from the numerical point 
of view. Similarly as in the evaluation of the Pi’s, many of these polytopes could be a priory shown to 
be of zero volume, but we decided to proceed in a different way: we estimated the true confidence level 
from above by terms not involving the Walsh averages and showed rather easily that this upper bound 
is strictly under 95%. 

A natural question arises, if the reasoning in case of n = 5 data points can be easily adopted or even 
generalized for larger n. Despite our effort we have not found any positive answer, because the situation 
complicates dramatically already for n = 6. 

The a la Tukey confidence interval for the true slope is implemented in the R package mblm without 
any warning about its deflated true confidence level. The results of our paper show that this functionality 
of the package (i.e. computation of the confidence interval for the true slope) should not be used, because 
it tends to provide too liberal interval estimates. We conclude that although the software R is of great 
help at a great variety of statistical analyses, one has to remember its startup message noting that it 
“comes with ABSOLUTELY NO WARRANTY”. 

Apart from the software issue, we provided a simple non-parametric example that an at first glance 
rather clever combination of some renown statistical methods (Theil’s slopes and Tukeys’s Cl in our 
case) may yield disastrous results, if one ignores the assumptions of their usage. 


Appendix 

Proof of Theorem 1 

Recall Nc, Nd and the hypothesis Hq from Section 3. In Theil (1950a) on p. 390, the true confidence 
level of the Theil-type Cl (s/, s„) is expressed as 

1 - 2 ■ PiNd 

where the probability is evaluated under Hq and the result holds even under a more general setting than 
discussed in our paper. Since K = Nc — Nd and Nc + Nd = N, we obtain that Nd = {N — K)/2 and 

- l) = 1-2-P{K > k). 


1 - 2 • P{Nd <l-l) = l-2-P 


N-K 


< 


N-k 


+ 1 


□ 


Proof of Theorem 2 

Since the smallest Walsh average wi is given by the smallest slope si as (si -|- si)/2 = si and the 
largest Walsh average is given by the largest slope Sio as (sio-l-Sio)/2 = SiO) we obtain si = < Wg 

and 104 ^ < W 55 = Sio- □ 

Proof of Theorem 3 

Part a): We prove the equivalent statement “wg < S 2 iff si -I- sg < 2s2”. Start with wg < S 2 and 
consider the 9 smallest Walsh averages wi < W 2 < ■ • ■ < wg. Each of them is of the form (si + Sj)/2 for 



some i < j and since S2 = (s2 + S2)/2, the assumption wg < sg means that 


^ (7) 

Because si < S 2 < • • • < sio, the sharp inequality (7) immediately implies, that i = 1 and the 9 smallest 
Walsh averages wi < wg < • ■ • < wg have to be of the form (si + si)/2 < (si + S2)/2 < • • • < (si + S9)/2. 
Therefore, the inequality wg < sg can be rewritten as (si + S9)/2 < (s 2 + •S2)/2 and the first part of the 
proof is complete. 

Now, start with si + S 9 < 2 s2, i.e. (si + sg)/2 < sg- Since the 8 Walsh averages (si + si)/2 < 
(si + S 2)/2 < • • • < (si + Ss)/2 are even smaller then (si + sg)/2, we see that there are at least 9 Walsh 
averages smaller than S 2 - Therefore, also the 9-th smallest Walsh averages, i.e. W 9 , is smaller than sg- 
Part b): Note that the proof of part a) is based on the natural ordering “the higher slope (or Walsh 
average), the higher index”. Using the reverse ordering “the higher slope (or Walsh average), the lower 
index” in the proof of part a), one obtains the “symmetric” counterpart of part a), which is part b). □ 

Proof of Theorem 4 

Split the whole probability space into these four disjoint random events: 

T : S2 < ■u:9 A Wj^T < Sg 

B : Sg < Wg A Sg < W 47 
C : Wg < Sg A W 47 < Sg 
D : Wg < Sg A sg < W47 

Denote by U the random event {/3i G {wg, W47)}. Note that the minimum and the maximum of all slopes 
Si and their Walsh averages Wi are si and sio, respectively. This implies that 

P{u nA) < p({/3i e (S 2 , S 9 )} n T) = pi, 

PiU nB) < p({/3i e (s 2 , sio)} n P) = pg, 

P{u n C) < P({/3i e (si, S 9 )} n C) = pg, 

P{U n P) < P({/3i e (si, sio)} n P) = P 4 , 

where the final equality in each row follows from Theorem 3. Hence, we obtain 

P{U) = (U^A)+ P(U r\B)+ P(U f^C)+ P(U P D) <PI+Pg+Pg+P4. 


□ 


Proof of Theorem 5 

By contradiction, let Sac be the greatest of Sab, Sac, Sbc - the case that Sac is the smallest can be 
treated analogously. By (5) and by noting that Xa < Xb < Xc, one observes that the inequality Sac > Sab 
is equivalent to 

(^a ^c)i^a ^b) A (^a ^b){^a ^c) 

and Sac > Sbc is equivalent to 

{Sa - ec)(xb - Xc) > {sb - ec){Xa - Xc). 

By summing these two inequalities we obtain (ea—Sc){xa—Xc) > (£a—£c)(xa—Xc) which is impossible. □ 

Proof of Theorem 6 

Theorem 5 implies that the minimum and maximum sample slopes si and sio are of the form si = 
Si^i+i and sio = Sjj+i for some distinct i and j from {1,2,3, 4}. Straightforward algebra implies that 
Sij — = (sio — si)/(* — j) under Condition 1, which means that 

sio ~ Si = \i — j\ • J- — Pi+iJ-+11. (8) 

Note that \i — j\ <3 (because 1 < t, j < 4) and if both Si^j and Si+ij+i belong to {s 2 , S 3 ,..., S 9 }, then 
one obtains from (8) that 

sio - Si < 3 (s9 - S 2 ). (9) 



However, summing the inequalities si + sg < 2 s 2 and 2 s 9 < sg + sio appearing in the definition of pi 
yields 

sio — Si > 3 (s9 — S2), 

which contradicts (9), i.e. pi = 0. 

It remains to treat the case when not both Sij and Si+ij+i belong to {sg, S 3 ,, sg}. This happens 
if and only if \i — j\ = 1. Without loss of generality, we will suppose that j = i + 1, i.e. si = Si^i+i, 
sio = Si+i^i +2 and i < 3. 

a) The case when i <2. We will show that the inequality 


2s 9 < S2 + Sio (10) 

appearing in the definition of p 4 is impossible. Because S 2 < Si+ 2 ,i+ 3 , S'i+iy+a < sg and sio = 5'i+i^i+2, 
the inequality (10) would imply that 2 S'i+i^i +3 < Si+2,i+3 + >5'i+i,i+2, which is equivalent to 0 < 0 under 
Condition 1. 

b) The case when i = 3. We will show that the inequality 

Si + Sg < 2 s2 (11) 

appearing in the definition ofp 4 is impossible. Because S 2 < 82 , 4 , S'2,3 < sg and si = S' 3 , 4 , the inequality 
(11) would imply that S '34 + 5 ' 2,3 < 2 , 82 , 4 , which is equivalent to 0 < 0 under Condition 1. □ 

Proof of Theorem 7 

Symmetry and independence of the distribution of the Si’s given by Condition 2, together with the 
equidistantness of the Xj’s given by Condition 1 means that moving from the Ej’s to the “equiprobable” 
— £i’s reverts the ordering of the sample slopes and also the ordering of their Walsh averages, because 
each sample slope changes symmetrically around /3i (cf. (5)). It means that, for example, the sample 
slope with the label sg gets the label sg, or the Walsh average with the label wg gets ^ 47 , etc. The 
relationships between the Si’s and Wj’s change accordingly: for example, sg < wg changes to W 47 < sg. 
Hence, we observe that the conditions defining pg change to conditions defining p^. □ 
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